Optimizing Many-Threads-to-Many-Cores Mapping in Parallel Electronic System Level Simulation

Author

Guantao Liu

Abstract

Dissertation by Guantao Liu, Doctor of Philosophy in Computer Engineering, University of California, Irvine, 2017. Chair: Professor Rainer Dömer.

In hardware/software codesign, Discrete Event Simulation (DES) has been used for decades to verify and validate the functionality of Electronic System Level (ESL) models. Because parallel computing platforms are readily available today, many Parallel Discrete Event Simulation (PDES) approaches have been proposed to improve simulation performance. However, as thread parallelism increases in ESL designs and core counts multiply on multi-core and many-core platforms, thread-to-core mapping becomes critical in PDES. In this dissertation, we propose a computation- and communication-aware approach to optimize thread mapping for parallel ESL simulation, with the goals of load balancing and communication minimization. Because we identify that the order in which parallel threads are dispatched has a significant influence on the total simulation time, and that Longest Job First (LJF) performs better than the Linux default thread dispatch policy, we first propose a segment-aware LJF scheduler for PDES. Our segment-aware scheduler can accurately predict the run time of the upcoming thread segments and thus make better dispatching decisions. Next, we define the concept of core distance for multi-core and many-core architectures, which quantifies core-to-core communication latency and characterizes processor hierarchies. For many-core architectures using directory-based cache coherence protocols, we observe that...

Similar resources

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to thread concurrency. In this paper, to resolve this problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

Clustering Cores for Parallel Thread Execution

In recent years, we have observed a strong trend towards using accelerators, such as GPUs, to speed up scientific applications. This results in a complex heterogeneous system in which traditional CPUs are used for the execution of sequential threads, while GPUs are used for accelerating parallel threads. Instead of following this trend, this paper introduces a new explicitly parallel instructio...

Effective cooperative scheduling of task-parallel applications on multiprogrammed parallel architectures

Emerging architecture designs include tens of processing cores on a single chip die; it is believed that the number of cores will reach the hundreds within a few years. However, most common parallel workloads cannot fully utilize such systems. They expose fluctuating parallelism, and do not scale up indefinitely as there is usually a point after which synchronization costs outweigh t...

Multicore Performance Optimization Using Partner Cores

As the push for parallelism continues to increase the number of cores on a chip, and add to the complexity of system design, the task of optimizing performance at the application level becomes nearly impossible for the programmer. Much effort has been spent on developing techniques for optimizing performance at runtime, but many techniques for modern processors employ the use of speculative thr...

ParaWeaver: Performance Evaluation on Programming Models for Fine Grained Threads

There is a trend towards multicore or manycore processors in computer architecture design. In addition, several parallel programming models have been introduced. Some extract concurrent threads implicitly whenever possible, resulting in fine grained threads. Others construct threads by explicit user specifications in the program, resulting in coarse grained threads. How these two mechanisms imp...

Publication date: 2017
